The method of estimation in Scott and Wild (Biometrika 84 (1997) 57-71 and J. Statist. Plann. Inference 96 (2001) 3-27) uses a reparametrization of the profile likelihood that often reduces the computation times dramatically. Showing the efficiency of estimators for this method has been a challenging problem. In this paper, we try to solve the problem by investigating conditions under which the efficient score function and the efficient information matrix can be expressed in terms of the parameters in the reparametrized model.
1. Introduction

In a series of papers, Scott and Wild [12, 13] developed methods of reparametrization of the profile likelihood that can be applied to a variety of response-selective sampling designs. The advantage of these methods is that they often give computationally efficient estimators. The (statistical) efficiency of these methods has been demonstrated in special cases by several authors. For example, Breslow, Robins and Wellner [3] considered case-control sampling where either a case or a control is selected by a randomization device with known selection probabilities, and the covariates of the resulting case or control are measured. In the case of two-phase, outcome-dependent sampling, Breslow, McNeney and Wellner [2] applied the missing value theory of Robins, Rotnitzky and Zhao [11] and Robins, Hsieh and Newey [10]. Here, individuals in the population are selected at random and their status (e.g., case or control) is determined. Then, with a probability depending on their status, the covariates are measured. The unobserved covariates are treated as missing data. Lee and Hirose [8] used the profile likelihood method to derive a semi-parametric efficiency bound, and then showed that this bound coincides with the asymptotic variance of the Scott-Wild estimator, hence demonstrating the efficiency of the estimator.

In Lee and Hirose [8], it was demonstrated that, in the case of the Scott-Wild estimator, it is possible to reparametrize the least favorable submodel so that the efficient score function and the efficient information matrix can be expressed in terms of the parameters in the reparametrized model.

The aim of this paper is to investigate conditions under which a reparametrization of the least favorable submodel yields efficient estimators.

We consider an $S$-vector of semi-parametric models $(\mathcal{P}_1,\dots,\mathcal{P}_S)$ where, for each $s=1,\dots,S$,

$$\mathcal{P}_s=\{p_s(x;\beta,\eta):\ \beta\in\Theta_\beta\subset\mathbb{R}^m,\ \eta\in\Theta_\eta\}$$

is a probability model on the sample space $\mathcal{X}_s$. Here $\beta$ is the parameter of interest, an $m$-dimensional parameter, and $\eta$ is the nuisance parameter, which may be infinite-dimensional. Let $(\beta_0,\eta_0)$ be the true value of $(\beta,\eta)$. We assume that $\Theta_\beta$ is a compact set containing an open neighborhood of $\beta_0$ in $\mathbb{R}^m$, and that $\Theta_\eta$ is a convex set containing $\eta_0$ in a Banach space $\mathcal{B}$. We refer to the $S$-vector of semi-parametric models $(\mathcal{P}_1,\dots,\mathcal{P}_S)$ as the multisample model.

Under the model, we observe $S$ independent samples $X_{s1},\dots,X_{sn_s}$ ($s=1,\dots,S$), where $X_{s1},\dots,X_{sn_s}$ are independently and identically distributed (i.i.d.) according to the model $\mathcal{P}_s$. Let $n=\sum_{s=1}^S n_s$. We assume the sample size proportions $(n_1/n,\dots,n_S/n)$ converge to weight probabilities $(w_1,\dots,w_S)$:

$$\left(\frac{n_1}{n},\dots,\frac{n_S}{n}\right)\to(w_1,\dots,w_S),\tag{1.1}$$

where $w_s>0$ and $\sum_{s=1}^S w_s=1$.

The log-likelihood for the multisample data is

$$l_n(\beta,\eta)=\sum_{s=1}^S\sum_{i=1}^{n_s}\log p_s(X_{si};\beta,\eta).\tag{1.2}$$
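The double sum in (1.2) can be sketched in a few lines. The names `multisample_loglik` and `log_p` are hypothetical. The toy model (two exponential samples with rates $\beta$ and $\beta\eta$, with scalar $\beta$ and $\eta$) is only an illustration and is not one of the paper's examples.

```python
import math
from typing import Callable, Sequence

# log p_s(x; beta, eta) for one sample s
LogDensity = Callable[[float, float, float], float]


def multisample_loglik(samples: Sequence[Sequence[float]],
                       log_p: Sequence[LogDensity],
                       beta: float, eta: float) -> float:
    """l_n(beta, eta) = sum_s sum_i log p_s(X_si; beta, eta), cf. (1.2)."""
    if len(samples) != len(log_p):
        raise ValueError("need exactly one log-density per sample")
    return sum(log_p_s(x, beta, eta)
               for sample, log_p_s in zip(samples, log_p)
               for x in sample)


def log_exponential(x: float, rate: float) -> float:
    return math.log(rate) - rate * x


# Toy model: sample 1 ~ Exp(beta), sample 2 ~ Exp(beta * eta).
log_p = [lambda x, b, e: log_exponential(x, b),
         lambda x, b, e: log_exponential(x, b * e)]
samples = [[0.5, 1.0], [2.0]]
n = sum(len(s) for s in samples)
proportions = [len(s) / n for s in samples]      # n_s / n, cf. (1.1)
value = multisample_loglik(samples, log_p, beta=1.0, eta=0.5)
print(proportions, value)
```

Each sample keeps its own density, which is the point of (1.2). The observations are not pooled into a single i.i.d. sample.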

The paper is organized as follows. In the rest of Section 1, we give examples of semi-parametric multisample models. In Section 2, we introduce the least favorable submodel in multisample models. In Section 3, we present the main result: conditions under which reparametrization gives efficient estimators in multisample models. In Section 4, we give a numerical example and use the result developed in the paper to show that the estimators in the example are efficient.



1.1. Examples 

The idea of multisample data is familiar from elementary statistics; for example, the well-known two-sample t-test and the one-way ANOVA for comparing several means both involve multiple samples. The following are several semi-parametric examples.
Example 1 (Biased sampling model). Vardi [14] developed the method of estimation in the $S$-sample biased sampling model with known selection bias weight functions. The following setup and notation are from [6].

Suppose that non-negative weight functions $w_1(x),\dots,w_S(x)$ are given, and let $G(x)$ be an unknown distribution function on a sample space $\mathcal{X}$. Define the corresponding biased sampling model by

$$p_s(x;G)=\frac{w_s(x)g(x)}{W_s(G)}\quad(s=1,\dots,S),$$

where $g(x)=dG(x)/d\mu$ with respect to Lebesgue measure $\mu$ and $W_s(G)=\int_{\mathcal{X}}w_s(x)\,dG(x)$. The $S$-sample biased sampling model generates $S$ independent samples

$$X_{s1},\dots,X_{sn_s}\sim p_s(x;G)\quad(s=1,\dots,S).$$

Gilbert, Lele and Vardi [5] considered an extension of this model that allows the weight function to depend on an unknown finite-dimensional parameter $\theta$.

Suppose a set of non-negative weight functions $w_1(x,\theta),\dots,w_S(x,\theta)$ depends on $\theta$. The semi-parametric biased sampling model is defined by

$$p_s(x;\theta,G)=\frac{w_s(x,\theta)g(x)}{W_s(\theta,G)}\quad(s=1,\dots,S),$$

where $W_s(\theta,G)=\int_{\mathcal{X}}w_s(x,\theta)\,dG(x)$. Gilbert [4] provides a large sample theory for this example.
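The following sketch evaluates the semi-parametric biased sampling densities when $G$ is discrete with finitely many support points, so that $W_s(\theta,G)$ becomes a finite sum. The weight functions are illustrative assumptions: an unweighted sample and a hypothetical exponential tilt $w_2(x,\theta)=e^{\theta x}$. All numbers are made up.

```python
import numpy as np


def biased_sampling_densities(support, g, weight_fns, theta):
    """Row s holds p_s(v_k; theta, G) = w_s(v_k, theta) g_k / W_s(theta, G)."""
    x = np.asarray(support, dtype=float)
    g = np.asarray(g, dtype=float)
    rows = []
    for weight in weight_fns:
        wx = weight(x, theta)              # w_s(v_k, theta) at every support point
        W = np.sum(wx * g)                 # W_s(theta, G) = integral of w_s dG
        rows.append(wx * g / W)
    return np.vstack(rows)


support = [0.0, 1.0, 2.0]
g = [0.2, 0.5, 0.3]                                  # masses of a discrete G
weight_fns = [lambda x, th: np.ones_like(x),         # s = 1: unweighted sample
              lambda x, th: np.exp(th * x)]          # s = 2: hypothetical tilt e^{theta x}
p = biased_sampling_densities(support, g, weight_fns, theta=0.7)
print(p, p.sum(axis=1))                              # each row sums to one
```

With the constant weight, the first row reproduces $g$ itself. This is the sense in which an ordinary random sample is the special case $w_s\equiv1$.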

The following examples are semi-parametric multisample models that all share the same underlying data-generating process on the sample space $\mathcal{Y}\times\mathcal{X}$, called the full data model:

$$\mathcal{Q}=\{p(y,x;\theta,G)=f(y\mid x;\theta)g(x):\ \theta\in\Theta,\ G\in\mathcal{G}\}.$$

Here $f(y\mid x;\theta)$ is a conditional density of $Y$ given $X$ that depends on a finite-dimensional parameter $\theta$. $G(x)$ is an unspecified distribution function of $X$ and is an infinite-dimensional nuisance parameter ($g(x)$ is the density of $G(x)$). We assume that $\Theta$ is a compact set containing a neighborhood of the true value $\theta_0$, and that $\mathcal{G}$ is the set of all distribution functions of $x$. Unless stated otherwise, $Y$ may be a discrete or continuous variable.

Example 2 (Case-control study). We assume that $Y$ takes values in $\{1,\dots,S\}$. In a case-control study, due to the design, we do not observe a random sample from the full data model $\mathcal{Q}$. Instead, for each $s=1,\dots,S$, we observe $n_s$ samples from the conditional distribution $P(X\mid Y=s)$. By Bayes' theorem, the density of $P(X\mid Y=s)$ is

$$p_s(x;\theta,G)=\frac{f(s\mid x;\theta)g(x)}{\int f(s\mid x;\theta)\,dG(x)}.$$

The case-control study is a special case of the semi-parametric biased sampling model of Example 1 with weight functions $w_s(x,\theta)=f(s\mid x;\theta)$ ($s=1,\dots,S$).
Example 3 (Missing data). Instead of observing the full data $(Y,X)$ from the full data model $\mathcal{Q}$ for all individuals, we observe $(Y,X)$ for $n_0$ samples and observe only $Y$ for $n_1$ samples. The result is the multisample data

$$(x_{01},y_{01}),\dots,(x_{0n_0},y_{0n_0}),\quad y_{11},\dots,y_{1n_1}$$

from a multisample model with densities

$$p_0(y,x;\theta,g)=f(y\mid x;\theta)g(x)$$

and

$$p_1(y;\theta,g)=\int f(y\mid x;\theta)g(x)\,dx.$$

This example is not a special case of Example 1.

Example 4 (Standard stratified sampling and two-phase, outcome-dependent sampling). For a partition $\mathcal{Y}\times\mathcal{X}=\bigcup_{s=1}^S\mathcal{S}_s$ of the sample space, let

$$Q_s(\theta,G)=\int\!\!\int f(y\mid x;\theta)\,1_{\{(y,x)\in\mathcal{S}_s\}}\,dy\,dG(x)$$

be the probability of $(Y,X)$ belonging to stratum $\mathcal{S}_s$.

In standard stratified sampling, for each $s=1,\dots,S$, a random sample of size $n_s$ is taken from the conditional distribution of $(Y,X)$ given stratum $\mathcal{S}_s$:

$$p_s(y,x;\theta,G)=\frac{f(y\mid x;\theta)g(x)\,1_{\{(y,x)\in\mathcal{S}_s\}}}{Q_s(\theta,G)}.$$

This is a more general version of the semi-parametric biased sampling model of Example 1 with weight functions $w_s(y,x,\theta)=f(y\mid x;\theta)1_{\{(y,x)\in\mathcal{S}_s\}}$ ($s=1,\dots,S$).
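The next sketch computes the stratum probabilities $Q_s(\theta,G)$ and the stratified densities $p_s(y,x;\theta,G)$. It assumes:

- a binary $Y$ with a logistic $f(y\mid x;\theta)$;
- a discrete $G$ on three points;
- a hypothetical three-stratum partition.

None of these choices come from the paper; they only make the integrals finite sums.

```python
import numpy as np


def logistic_f(y, x, theta):
    """f(y | x; theta) for binary y under a logistic model, theta = (a, b)."""
    a, b = theta
    eta = a + b * x
    return np.exp(y * eta) / (1.0 + np.exp(eta))


def stratified_densities(support, g, theta, strata):
    """Return Q_s(theta, G) and p_s(y, x; theta, G) on the grid {0, 1} x support.

    `strata` is a list of functions returning the indicator of (y, x) in S_s;
    together they must partition the grid.
    """
    x = np.asarray(support, dtype=float)[None, :]      # shape (1, K)
    y = np.array([0.0, 1.0])[:, None]                  # shape (2, 1)
    joint = logistic_f(y, x, theta) * np.asarray(g, dtype=float)[None, :]
    Q, p = [], []
    for in_stratum in strata:
        mask = in_stratum(y, x)                        # shape (2, K), boolean
        Q_s = np.sum(joint * mask)                     # probability of stratum S_s
        Q.append(Q_s)
        p.append(joint * mask / Q_s)                   # conditional density given S_s
    return np.array(Q), p


# A hypothetical partition into S = 3 strata.
strata = [lambda y, x: (y == 0) & (x <= 1),
          lambda y, x: (y == 0) & (x > 1),
          lambda y, x: (y == 1) & np.isfinite(x)]
Q, p = stratified_densities([0.0, 1.0, 2.0], [0.5, 0.3, 0.2], (-1.0, 0.8), strata)
print(Q, [ps.sum() for ps in p])
```

Because the strata partition the grid, the $Q_s$ sum to one. Each $p_s$ is a proper density on its own stratum.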

Lawless, Kalbfleisch and Wild [7] discussed variations of the two-phase, outcome-dependent sampling design: the variable probability sampling designs (VPS1, VPS2) and the basic stratified sampling design (BSS). For all three sampling schemes, we have $m_s$ fully observed units and $n_s-m_s$ subjects for which the only information retained is the identity of the stratum, $s=1,\dots,S$. The corresponding likelihood is

$$L(\theta,G)=\left\{\prod_{s=1}^S\prod_{i=1}^{m_s}f(y_{si}\mid x_{si};\theta)g(x_{si})\right\}\prod_{s=1}^S Q_s(\theta,G)^{n_s-m_s}.\tag{1.3}$$

We interpret the observed data from two-phase, outcome-dependent sampling as data from a multisample model with densities

$$p_1(y,x;\theta,G)=f(y\mid x;\theta)g(x)$$

and

$$p_2(s;\theta,G)=Q_s(\theta,G).$$

This example is not a special case of Example 1.

2. The least favorable submodel 

The log-likelihood function for a single observation in the multisample model is

$$\ell(s,x;\beta,\eta)=\log p_s(x;\beta,\eta)\quad(x\in\mathcal{X}_s,\ s=1,\dots,S).\tag{2.1}$$

The expectation with respect to the density $p_s(x;\beta,\eta)$ is denoted by $E_{s,\beta,\eta}$. We assume that there is a differentiable function $\beta\mapsto\hat\eta_\beta$ such that

$$\hat\eta_{\beta_0}=\eta_0\tag{2.2}$$

and

$$\ell^*(s,x)=\frac{\partial}{\partial\beta}\bigg|_{\beta=\beta_0}\ell(s,x;\beta,\hat\eta_\beta)\tag{2.3}$$

is the efficient score function (the definition of the efficient score function in the multisample model is given in Appendix A). We call the model

$$p_s(x;\beta,\hat\eta_\beta)\quad(\beta\in\Theta_\beta,\ s=1,\dots,S)$$

the least favorable submodel for the multisample model $(\mathcal{P}_1,\dots,\mathcal{P}_S)$.

Remark 2.1. Assume mild regularity conditions, and assume that

$$\hat\eta_\beta=\arg\max_{\eta}\sum_{s=1}^S w_sE_{s,\beta_0,\eta_0}\{\log p_s(X;\beta,\eta)\}$$

exists for all $\beta$ in some neighborhood of $\beta_0$. Then (2.3) is the efficient score function by [9]. The definition of the least favorable submodel given above includes this as a special case, but we do not limit our consideration to this case.

Our approach uses the method in Scott and Wild [12, 13] to find a candidate function $\hat\eta_\beta$. It then uses Theorem A.2 in Appendix A to verify that (2.3), with this candidate function, gives the efficient score function. In the next example, we illustrate this procedure.

2.1. Example: Stratified sampling (continued) 

Stratified sampling was introduced in Example 4. 
Let

$$Q_{s|x}(x;\theta)=\int f(y\mid x;\theta)\,1_{\{(y,x)\in\mathcal{S}_s\}}\,dy.$$

For each $s=1,\dots,S$, let $F_{s0}$ be the cumulative distribution function for the density $p_s(y,x;\theta_0,g_0)$ at the true value $(\theta_0,g_0)$. The expected log-likelihood in the model is

$$\sum_{s=1}^S w_sE_{s,0}\{\log p_s(y,x;\theta,g)\}=\sum_{s=1}^S w_s\int\log p_s(y,x;\theta,g)\,dF_{s0}(y,x).$$

For each $\theta$, the method in Scott and Wild [12, 13] finds a maximizer $\hat g_\theta(x)$ of the expected log-likelihood, under the assumption that the support of the distribution of $X$ is finite; that is, $\mathrm{SUPP}(X)=\{v_1,\dots,v_K\}$. Let $(g_1,\dots,g_K)=(g(v_1),\dots,g(v_K))$. Then $\log g(x)$ and $Q_s(\theta,g)$ can be expressed as

$$\log g(x)=\sum_{k=1}^K 1_{\{x=v_k\}}\log g_k,\qquad Q_s(\theta,g)=\int Q_{s|x}(x;\theta)g(x)\,dx=\sum_{k=1}^K Q_{s|x}(v_k;\theta)g_k.$$

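To find the maximizer $(g_1,\dots,g_K)$ of the expected log-likelihood

$$\sum_{s=1}^S w_s\int\log p_s(y,x;\theta,g)\,dF_{s0}=\sum_{s=1}^S w_s\left[\int\{\log f(y\mid x;\theta)+\log g(x)\}\,dF_{s0}-\log Q_s(\theta,g)\right]$$

at $\theta$, differentiate this expression with respect to $g_k$ and set the derivative equal to zero:

$$\frac{\partial}{\partial g_k}\sum_{s=1}^S w_s\int\log p_s(y,x;\theta,g)\,dF_{s0}=\sum_{s=1}^S w_s\left\{\frac{1}{g_k}\int 1_{\{x=v_k\}}\,dF_{s0}-\frac{Q_{s|x}(v_k;\theta)}{Q_s(\theta,g)}\right\}=0.$$

The solution $g_k$ to this equation satisfies

$$g_k=\frac{\sum_{s=1}^S w_s\int 1_{\{x=v_k\}}\,dF_{s0}}{\sum_{s=1}^S w_sQ_{s|x}(v_k;\theta)/Q_s(\theta,g)}.$$

The form of this function motivates the following result.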

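Lemma 2.1 (The least favorable submodel). For $\theta\in\Theta$, let

$$\hat g_\theta(x)=\frac{f_0(x)}{\sum_{s=1}^S w_sQ_{s|x}(x;\theta)/Q_s(\theta)},\tag{2.4}$$

where

$$f_0(x)=\sum_{s=1}^S w_s\frac{Q_{s|x}(x;\theta_0)g_0(x)}{Q_s(\theta_0,g_0)}\tag{2.5}$$

and

$$Q_s(\theta)=\int Q_{s|x}(x;\theta)\hat g_\theta(x)\,dx\quad(s=1,\dots,S).\tag{2.6}$$

Then the efficient score function is given by

$$\ell^*(s,y,x)=\frac{\partial}{\partial\theta}\bigg|_{\theta=\theta_0}\log p_s(y,x;\theta,\hat g_\theta).\tag{2.7}$$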
Proof. In Appendix B, we show that $\sum_{s=1}^S w_s\int\log p_s(y,x;\theta,\hat g_\theta)\,dF_{s0}$ satisfies conditions (A.1) and (A.2) in Theorem A.2 in Appendix A, so the claim follows from this theorem. □

Remark 2.2. Equations (2.4) and (2.6) are consistent at $\theta=\theta_0$. On the one hand, (2.4) and (2.5) imply that $\hat g_{\theta_0}(x)=g_0(x)$ if $Q_s(\theta_0)=Q_s(\theta_0,g_0)$. On the other hand, if $\hat g_{\theta_0}(x)=g_0(x)$, then (2.6) gives $Q_s(\theta_0)=\int Q_{s|x}(x;\theta_0)g_0(x)\,dx=Q_s(\theta_0,g_0)$.


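The Scott-Wild construction can be sketched as a fixed-point iteration on a finite support. Given $Q_s(\theta)$, update $\hat g_\theta$ by (2.4), then recompute $Q_s(\theta)$ by (2.6). This is an illustrative algorithm under stated assumptions, not necessarily the authors' implementation. The assumptions are case-control strata $y=0,1$, a logistic $f$ and a three-point support; the function names are hypothetical.

Equations (2.4) and (2.6) determine $\hat g_\theta$ only up to a constant factor, and that factor cancels in $p_s$. For this reason each iterate is normalized. At $\theta=\theta_0$ the iteration should return $g_0$, as Remark 2.2 predicts.

```python
import numpy as np


def cond_probs(support, theta):
    """Q_{s|x}(v_k; theta) = f(y = s | v_k; theta) for s = 0, 1 (logistic); shape (2, K)."""
    a, b = theta
    p1 = 1.0 / (1.0 + np.exp(-(a + b * np.asarray(support, dtype=float))))
    return np.vstack([1.0 - p1, p1])


def f0_density(support, w, theta0, g0):
    """f_0(x) of (2.5): the w-weighted mixture of the stratum-wise marginals of X."""
    Qx = cond_probs(support, theta0)
    Q = Qx @ g0                                   # Q_s(theta_0, g_0)
    return (w[:, None] * Qx / Q[:, None]).sum(axis=0) * g0


def g_hat(support, w, theta, f0, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration for (2.4), with Q_s(theta) updated through (2.6).

    The update is invariant to rescaling g, so each iterate is normalized to sum to 1.
    """
    Qx = cond_probs(support, theta)
    g = np.full(len(f0), 1.0 / len(f0))
    for _ in range(max_iter):
        Q = Qx @ g                                # Q_s(theta) as in (2.6)
        g_new = f0 / (w[:, None] * Qx / Q[:, None]).sum(axis=0)
        g_new /= g_new.sum()
        if np.max(np.abs(g_new - g)) < tol:
            return g_new
        g = g_new
    raise RuntimeError("fixed-point iteration did not converge")


support = np.array([0.0, 1.0, 2.0])
w = np.array([0.5, 0.5])                          # w_0, w_1 (controls, cases)
theta0, g0 = (-1.0, 0.8), np.array([0.5, 0.3, 0.2])
f0 = f0_density(support, w, theta0, g0)
g_at_theta0 = g_hat(support, w, theta0, f0)       # Remark 2.2: should equal g0
g_at_theta1 = g_hat(support, w, (-0.5, 0.3), f0)  # profile maximizer at another theta
print(g_at_theta0, g_at_theta1)
```

In practice, $f_0$ would be replaced by the pooled empirical distribution of $X$ with weights $n_s/n$.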

3. Main result 

Suppose there is a finite-dimensional, vector-valued function $\beta\mapsto q_\beta$ such that the density for the least favorable submodel is of the form

$$p_s(x;\beta,\hat\eta_\beta)=p_s^*(x;\beta,q_\beta)\quad\text{for all }\beta\in\Theta_\beta\ (s=1,\dots,S).\tag{3.1}$$

Here the function $p_s^*(x;\beta,q)$ is twice continuously differentiable with respect to $(\beta,q)$, and $q$ is a finite-dimensional parameter. Further, suppose

$$\sum_{s=1}^S w_s\int p_s^*(x;\beta,q)\,dx=1\quad\text{for all }(\beta,q)\in\Theta_\beta\times D_q,\tag{3.2}$$

where $\Theta_\beta$ and $D_q$ are neighborhoods of $\beta_0$ and $q_{\beta_0}$, respectively. Then the model

$$p_s^*(x;\beta,q)\quad(\beta\in\Theta_\beta,\ q\in D_q,\ s=1,\dots,S)$$

is called a reparametrized model for the least favorable submodel. The score functions for $\beta$ and $q$ in the reparametrized model are denoted by $\ell_1(s,x;\beta,q)=(\partial/\partial\beta)\log p_s^*(x;\beta,q)$ and $\ell_2(s,x;\beta,q)=(\partial/\partial q)\log p_s^*(x;\beta,q)$, respectively.

Remark 3.1. In general, we may not have the condition

$$\int p_s^*(x;\beta,q)\,dx=1\quad\text{for all }(\beta,q)\in\Theta_\beta\times D_q\ (s=1,\dots,S).$$

Therefore, there is no guarantee that each $p_s^*(x;\beta,q)$ is a probability model. However, (3.2) ensures that the linear combination $\sum_{s=1}^S w_sp_s^*(x;\beta,q)$ acts like a probability model, which looks like a mixture model. The main differences between the multisample model and the mixture model are the data and the asymptotics. For example, the log-likelihood and the information matrix in the mixture model are, respectively,

$$\sum_{i=1}^n\log\left\{\sum_{s=1}^S w_sp_s^*(x_i;\beta,q)\right\}$$

and

$$\int\frac{\left\{\sum_{s=1}^S w_s(\partial/\partial(\beta,q))p_s^*(x;\beta,q)\right\}^{\otimes2}}{\sum_{s=1}^S w_sp_s^*(x;\beta,q)}\,dx,$$

where $a^{\otimes2}=aa^T$,
Y. Hirose and A. Lee 



while the log-likelihood and the information matrix in the multisample model are, respectively, (1.2) and

$$\sum_{s=1}^S w_s\int\frac{\left\{(\partial/\partial(\beta,q))p_s^*(x;\beta,q)\right\}^{\otimes2}}{p_s^*(x;\beta,q)}\,dx.$$

Remark 3.2. Since $\hat\eta_{\beta_0}=\eta_0$, (3.1) gives $p_s(x;\beta_0,\eta_0)=p_s^*(x;\beta_0,q_{\beta_0})$ ($s=1,\dots,S$). Therefore, for the reparametrized model, the notation $E_{s,0}$, $s=1,\dots,S$, is used for the expectations at the true value $(\beta_0,q_{\beta_0})$.

For a measurable function $f(s,x;\beta,q)$, define the centering of $f(s,x;\beta,q)$ by

$$f^c(s,x;\beta,q)=f(s,x;\beta,q)-E_{s,0}\{f(s,X;\beta_0,q_{\beta_0})\}.$$

The function $f^c(s,x;\beta,q)$ is called the centered $f(s,x;\beta,q)$.

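Theorem 3.1 (Efficiency in a reparametrized model). We assume that the least favorable submodel and the corresponding reparametrized model are as in (2.2), (2.3), (3.1) and (3.2). Further, assume that

$$\frac{\partial}{\partial q}\bigg|_{q=q_\beta}\sum_{s=1}^S w_sE_{s,0}\{\log p_s^*(X;\beta,q)\}=0\quad\text{for }\beta\in\Theta_\beta\tag{3.3}$$

and that $\sum_{s=1}^S w_sE_{s,0}(\ell_2^c\ell_2^{cT})$ is non-singular. Then the efficient score function and the efficient information matrix in the original multisample model $(\mathcal{P}_1,\dots,\mathcal{P}_S)$ are given by

$$\ell^*(s,x)=\ell_1^c-\left\{\sum_{s=1}^S w_sE_{s,0}(\ell_1^c\ell_2^{cT})\right\}\left\{\sum_{s=1}^S w_sE_{s,0}(\ell_2^c\ell_2^{cT})\right\}^{-1}\ell_2^c\tag{3.4}$$

and

$$I^*=\sum_{s=1}^S w_sE_{s,0}(\ell_1^c\ell_1^{cT})-\left\{\sum_{s=1}^S w_sE_{s,0}(\ell_1^c\ell_2^{cT})\right\}\left\{\sum_{s=1}^S w_sE_{s,0}(\ell_2^c\ell_2^{cT})\right\}^{-1}\left\{\sum_{s=1}^S w_sE_{s,0}(\ell_2^c\ell_1^{cT})\right\},\tag{3.5}$$

where $\ell_1^c(s,x;\beta,q)$ and $\ell_2^c(s,x;\beta,q)$ are the centered score functions for $\beta$ and $q$ in the reparametrized model, respectively, evaluated at $(\beta_0,q_{\beta_0})$.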

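Proof. By (2.3) and (3.1), the efficient score function is given by

$$\ell^*(s,x)=\frac{\partial}{\partial\beta}\bigg|_{\beta=\beta_0}\log p_s^*(x;\beta,q_\beta)=\ell_1(s,x;\beta_0,q_{\beta_0})+\dot q_{\beta_0}^T\ell_2(s,x;\beta_0,q_{\beta_0}),\tag{3.6}$$

where $\dot q_\beta=(\partial/\partial\beta^T)q_\beta$.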
Reparametrization of the least favorable submodel 9 
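Since $E_{s,0}\{\ell^*(s,X)\}=0$ ($s=1,\dots,S$), we have

$$E_{s,0}\{\ell_1(s,X;\beta_0,q_{\beta_0})\}+\dot q_{\beta_0}^TE_{s,0}\{\ell_2(s,X;\beta_0,q_{\beta_0})\}=0\quad(s=1,\dots,S).\tag{3.7}$$

Therefore, (3.6) and (3.7) imply

$$\ell^*(s,x)=\ell_1^c(s,x;\beta_0,q_{\beta_0})+\dot q_{\beta_0}^T\ell_2^c(s,x;\beta_0,q_{\beta_0}).\tag{3.8}$$

By differentiating (3.2) with respect to $q$, we have, for all $(\beta,q)\in\Theta_\beta\times D_q$,

$$\sum_{s=1}^S w_s\int\ell_2(s,x;\beta,q)p_s^*(x;\beta,q)\,dx=0.$$

In particular, for all $\beta\in\Theta_\beta$,

$$\sum_{s=1}^S w_s\int\ell_2(s,x;\beta,q_\beta)p_s^*(x;\beta,q_\beta)\,dx=0.$$

Differentiating with respect to $\beta$ at $\beta_0$ gives

$$\sum_{s=1}^S w_s\int\left\{\frac{\partial}{\partial\beta^T}\bigg|_{\beta=\beta_0}\ell_2(s,x;\beta,q_\beta)\right\}p_s^*(x;\beta_0,q_{\beta_0})\,dx+\sum_{s=1}^S w_s\int\ell_2(s,x;\beta_0,q_{\beta_0})\left\{\frac{\partial}{\partial\beta^T}\bigg|_{\beta=\beta_0}p_s^*(x;\beta,q_\beta)\right\}dx=0.$$

By the first equality in (3.6), $(\partial/\partial\beta^T)|_{\beta=\beta_0}p_s^*(x;\beta,q_\beta)=p_s^*(x;\beta_0,q_{\beta_0})\,\ell^{*T}(s,x)$. Hence this equation is equivalent to

$$\sum_{s=1}^S w_sE_{s,0}\left\{\frac{\partial}{\partial\beta^T}\bigg|_{\beta=\beta_0}\ell_2(s,X;\beta,q_\beta)\right\}=-\sum_{s=1}^S w_sE_{s,0}(\ell_2\ell^{*T}).\tag{3.9}$$

By differentiating (3.3) with respect to $\beta$ at $\beta_0$, we get

$$0=\sum_{s=1}^S w_sE_{s,0}\left\{\frac{\partial}{\partial\beta^T}\bigg|_{\beta=\beta_0}\ell_2(s,X;\beta,q_\beta)\right\}=-\sum_{s=1}^S w_sE_{s,0}(\ell_2\ell^{*T})=-\sum_{s=1}^S w_sE_{s,0}(\ell_2^c\ell^{*T}),$$

where we used (3.9) and $E_{s,0}\{\ell^*(s,X)\}=0$ ($s=1,\dots,S$).

Therefore, the centered score function $\ell_2^c(s,x;\beta_0,q_{\beta_0})$ and the efficient score function $\ell^*(s,x)$ are uncorrelated. Since $\ell^*=\ell_1^c+\dot q_{\beta_0}^T\ell_2^c$ (cf. (3.8)), $-\dot q_{\beta_0}^T\ell_2^c$ is the projection of $\ell_1^c$ onto the space spanned by $\ell_2^c$. By the projection theorem (Theorem A.1 in Appendix A), we have

$$\dot q_{\beta_0}^T=-\left\{\sum_{s=1}^S w_sE_{s,0}(\ell_1^c\ell_2^{cT})\right\}\left\{\sum_{s=1}^S w_sE_{s,0}(\ell_2^c\ell_2^{cT})\right\}^{-1}.$$

The rest of the claims follow by substituting this expression into (3.8). □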
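Formulas (3.4) and (3.5) amount to a linear regression of $\ell_1^c$ on $\ell_2^c$ followed by a Schur complement. The sketch below assumes that the covariance blocks $C_{ij}=\sum_s w_sE_{s,0}(\ell_i^c\ell_j^{cT})$ are already available. A random positive-definite matrix stands in for them, and the function name is hypothetical.

```python
import numpy as np


def efficient_score_coef_and_info(C11, C12, C22):
    """Return (K, I_star) from C_ij = sum_s w_s E_{s,0}(l_i^c l_j^cT).

    l*(s, x) = l_1^c - K l_2^c with K = C12 C22^{-1}, cf. (3.4);
    I_star = C11 - C12 C22^{-1} C21, cf. (3.5).
    """
    K = np.linalg.solve(C22, C12.T).T        # C12 C22^{-1}, using symmetry of C22
    I_star = C11 - K @ C12.T                 # C21 = C12^T
    return K, I_star


# Stand-in joint covariance of (l_1^c, l_2^c) with m = 2 and dim(q) = 3.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
C = A @ A.T + 5.0 * np.eye(5)
m = 2
K, I_star = efficient_score_coef_and_info(C[:m, :m], C[:m, m:], C[m:, m:])

# l* = L (l_1^c, l_2^c) with L = [I, -K]: check Var(l*) = I_star and cov(l*, l_2^c) = 0.
L = np.hstack([np.eye(m), -K])
print(np.allclose(L @ C @ L.T, I_star), np.allclose(L @ C[:, m:], 0.0))
```

The second check mirrors the key step of the proof: the efficient score is uncorrelated with the centered $q$-score.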



Remark 3.3. Under the usual regularity conditions, the solution $(\hat\beta_n,\hat q_n)$ to the system of score equations

$$\sum_{s=1}^S\sum_{i=1}^{n_s}\ell_1(s,X_{si};\hat\beta_n,\hat q_n)=0,\qquad\sum_{s=1}^S\sum_{i=1}^{n_s}\ell_2(s,X_{si};\hat\beta_n,\hat q_n)=0$$

is asymptotically distributed as

$$n^{1/2}\begin{pmatrix}\hat\beta_n-\beta_0\\ \hat q_n-q_0\end{pmatrix}\xrightarrow{\ d\ }N\left(\begin{pmatrix}0\\0\end{pmatrix},\Sigma^{-1}\right),$$

where

$$\Sigma=\begin{pmatrix}\sum_{s=1}^S w_sE_{s,0}(\ell_1^c\ell_1^{cT}) & \sum_{s=1}^S w_sE_{s,0}(\ell_1^c\ell_2^{cT})\\ \sum_{s=1}^S w_sE_{s,0}(\ell_2^c\ell_1^{cT}) & \sum_{s=1}^S w_sE_{s,0}(\ell_2^c\ell_2^{cT})\end{pmatrix}.$$

Then the asymptotic variance of $n^{1/2}(\hat\beta_n-\beta_0)$ is $(I^*)^{-1}$, where $I^*$ is the efficient information for $\beta$ given by (3.5) (cf. Bickel et al. [1], page 28). In this case, the estimator $\hat\beta_n$ is efficient. The efficiency of the estimator based on the reparametrization is demonstrated in a numerical example in Section 4.
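The last step of Remark 3.3 uses the partitioned-inverse identity: the $\beta$-block of $\Sigma^{-1}$ equals the inverse of the Schur complement (3.5). A quick numerical check, with a random positive-definite stand-in for $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
Sigma = A @ A.T + 4.0 * np.eye(4)          # stand-in for Sigma, with m = 2 and dim(q) = 2
m = 2
C11, C12, C22 = Sigma[:m, :m], Sigma[:m, m:], Sigma[m:, m:]
I_star = C11 - C12 @ np.linalg.solve(C22, C12.T)   # efficient information, cf. (3.5)
beta_block = np.linalg.inv(Sigma)[:m, :m]         # upper-left block of Sigma^{-1}
print(np.allclose(beta_block, np.linalg.inv(I_star)))
```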

3.1. Example: Stratified sampling (continued) 

In this section, we illustrate the use of Theorem 3.1 in the stratified sampling example. We derive the efficient score function and the efficient information bound in terms of the parameters of a reparametrized form of the least favorable submodel.

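Lemma 2.1 gives the least favorable submodel with densities

$$p_s(y,x;\theta,\hat g_\theta)=\frac{f(y\mid x;\theta)\hat g_\theta(x)\,1_{\{(y,x)\in\mathcal{S}_s\}}}{Q_s(\theta)}\quad(s=1,\dots,S),$$

where $\hat g_\theta$ is given by (2.4). By replacing $Q(\theta)=(Q_1(\theta),\dots,Q_{S-1}(\theta),Q_S(\theta))$ with $q=(q_1,\dots,q_{S-1},1)$, we consider a reparametrized model of the form

$$p_s^*(y,x;\theta,q)=\frac{f(y\mid x;\theta)g_{\theta,q}(x)\,1_{\{(y,x)\in\mathcal{S}_s\}}}{q_s}\quad(s=1,\dots,S),\tag{3.10}$$

where

$$g_{\theta,q}(x)=\frac{f_0(x)}{\sum_{s=1}^S w_sQ_{s|x}(x;\theta)/q_s}\tag{3.11}$$

and $f_0(x)$ is given by (2.5). Multiplying every component of $q$ by the same positive constant multiplies both $g_{\theta,q}$ and each $q_s$ by that constant, so $p_s^*$ is unchanged. The normalization $q_S=1$ therefore loses no generality. The true value of $(\theta,q)$ is

$$(\theta_0,q_0)=\left(\theta_0,\frac{Q_1(\theta_0,g_0)}{Q_S(\theta_0,g_0)},\dots,\frac{Q_{S-1}(\theta_0,g_0)}{Q_S(\theta_0,g_0)},1\right).$$

Let $D_q$ be some neighborhood of $q_0$.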

We will demonstrate that the conditions in Theorem 3.1 are satisfied, so that we can 
apply the theorem to identify the efficient score function and the efficient information 
matrix in the example. 

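First, we show that

$$\sum_{s=1}^S w_s\int\!\!\int p_s^*(y,x;\theta,q)\,dy\,dx=1\quad\text{for all }(\theta,q)\in\Theta\times D_q.$$

For any $(\theta,q)$, since $Q_{s|x}(x;\theta)=\int f(y\mid x;\theta)1_{\{(y,x)\in\mathcal{S}_s\}}\,dy$,

$$\begin{aligned}\sum_{s=1}^S w_s\int\!\!\int p_s^*(y,x;\theta,q)\,dy\,dx&=\sum_{s=1}^S w_s\int\!\!\int\frac{f(y\mid x;\theta)g_{\theta,q}(x)1_{\{(y,x)\in\mathcal{S}_s\}}}{q_s}\,dy\,dx\\&=\sum_{s=1}^S w_s\int\frac{Q_{s|x}(x;\theta)g_{\theta,q}(x)}{q_s}\,dx\\&=\int\left\{\sum_{s=1}^S\frac{w_sQ_{s|x}(x;\theta)}{q_s}\right\}g_{\theta,q}(x)\,dx\\&=\int f_0(x)\,dx\quad(\text{by }(3.11))\\&=1\quad(\text{by }(2.5)).\end{aligned}$$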
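Condition (3.2) can be checked numerically for a discrete version of this example. The sketch assumes a logistic $f$, a three-point support for $X$, a hypothetical three-stratum partition and arbitrary $(\theta,q)$. It confirms that the $w_s$-weighted total mass of the $p_s^*$ is one, even though the individual $p_s^*$ need not integrate to one (Remark 3.1).

```python
import numpy as np


def f_cond(y, x, theta):
    """Logistic f(y | x; theta) for binary y, theta = (a, b)."""
    a, b = theta
    eta = a + b * x
    return np.exp(y * eta) / (1.0 + np.exp(eta))


support = np.array([0.0, 1.0, 2.0])
Y, X = np.array([0.0, 1.0])[:, None], support[None, :]     # grid {0, 1} x support
masks = [(Y == 0) & (X <= 1), (Y == 0) & (X > 1), (Y == 1) & np.isfinite(X)]
w = np.array([0.3, 0.3, 0.4])                              # sampling weights w_s


def Q_given_x(theta):
    """Q_{s|x}(v_k; theta) = sum_y f(y | v_k; theta) 1{(y, v_k) in S_s}; shape (S, K)."""
    fy = f_cond(Y, X, theta)
    return np.array([(fy * mk).sum(axis=0) for mk in masks])


theta0, g0 = (-1.0, 0.8), np.array([0.5, 0.3, 0.2])
Qx0 = Q_given_x(theta0)
f0 = (w[:, None] * Qx0 / (Qx0 @ g0)[:, None]).sum(axis=0) * g0     # (2.5)


def p_star(theta, q):
    """Arrays p*_s(y, x; theta, q), s = 1, ..., S, from (3.10) and (3.11)."""
    g_tq = f0 / (w[:, None] * Q_given_x(theta) / q[:, None]).sum(axis=0)   # (3.11)
    joint = f_cond(Y, X, theta) * g_tq[None, :]
    return [joint * mk / q[s] for s, mk in enumerate(masks)]


theta, q = (0.4, -0.2), np.array([0.7, 2.5, 1.0])     # arbitrary values with q_S = 1
densities = p_star(theta, q)
total = sum(w[s] * ps.sum() for s, ps in enumerate(densities))
print(total, [ps.sum() for ps in densities])   # total equals 1 up to rounding
```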

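Second, we show that for all $\theta\in\Theta$,

$$\frac{\partial}{\partial q}\bigg|_{q=Q(\theta)/Q_S(\theta)}\sum_{s=1}^S w_sE_{s,0}\{\log p_s^*(y,x;\theta,q)\}=0.\tag{3.12}$$

For $j=1,\dots,S-1$, after dropping the terms that do not depend on $q$, the derivative is

$$\begin{aligned}\frac{\partial}{\partial q_j}\sum_{s=1}^S w_sE_{s,0}\{\log p_s^*(y,x;\theta,q)\}&=\frac{\partial}{\partial q_j}\sum_{s=1}^S w_sE_{s,0}\left\{-\log\sum_{s'=1}^S\frac{w_{s'}Q_{s'|x}(x;\theta)}{q_{s'}}-\log q_s\right\}\\&=\sum_{s=1}^S w_sE_{s,0}\left\{\frac{w_jQ_{j|x}(x;\theta)/q_j^2}{\sum_{s'=1}^S w_{s'}Q_{s'|x}(x;\theta)/q_{s'}}\right\}-\frac{w_j}{q_j}\\&=\int\frac{w_jQ_{j|x}(x;\theta)/q_j^2}{\sum_{s'=1}^S w_{s'}Q_{s'|x}(x;\theta)/q_{s'}}\sum_{s=1}^S\frac{w_sQ_{s|x}(x;\theta_0)g_0(x)}{Q_s(\theta_0,g_0)}\,dx-\frac{w_j}{q_j}\\&=\frac{w_j}{q_j^2}\int\frac{Q_{j|x}(x;\theta)f_0(x)}{\sum_{s'=1}^S w_{s'}Q_{s'|x}(x;\theta)/q_{s'}}\,dx-\frac{w_j}{q_j}\quad(\text{by }(2.5))\\&=\frac{w_j}{q_j^2}\left\{\int Q_{j|x}(x;\theta)g_{\theta,q}(x)\,dx-q_j\right\}.\end{aligned}$$

Now take $q=(q_1,\dots,q_{S-1},1)=(Q_1(\theta)/Q_S(\theta),\dots,Q_{S-1}(\theta)/Q_S(\theta),1)$. By (2.4) and (3.11), $g_{\theta,q}=\hat g_\theta/Q_S(\theta)$. Then (2.6) gives $\int Q_{j|x}(x;\theta)g_{\theta,q}(x)\,dx=Q_j(\theta)/Q_S(\theta)=q_j$, and (3.12) follows.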
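A finite-difference check of this calculation, for a discrete stratified model with the same illustrative assumptions (logistic $f$, three support points, three hypothetical strata). The check does two things:

- At $\theta=\theta_0$ it verifies that the gradient vanishes. Here $\hat g_{\theta_0}=g_0$ by Remark 2.2, so $Q(\theta_0)/Q_S(\theta_0)$ is available in closed form without solving for $\hat g_\theta$.
- At an arbitrary $(\theta,q)$ it compares the derived expression $(w_j/q_j^2)\{\int Q_{j|x}g_{\theta,q}\,dx-q_j\}$ with finite differences.

```python
import numpy as np


def f_cond(y, x, theta):
    a, b = theta
    eta = a + b * x
    return np.exp(y * eta) / (1.0 + np.exp(eta))


support = np.array([0.0, 1.0, 2.0])
Y, X = np.array([0.0, 1.0])[:, None], support[None, :]
masks = [(Y == 0) & (X <= 1), (Y == 0) & (X > 1), (Y == 1) & np.isfinite(X)]
w = np.array([0.3, 0.3, 0.4])
theta0, g0 = (-1.0, 0.8), np.array([0.5, 0.3, 0.2])


def Q_given_x(theta):
    fy = f_cond(Y, X, theta)
    return np.array([(fy * mk).sum(axis=0) for mk in masks])


Qx0 = Q_given_x(theta0)
Q0 = Qx0 @ g0                                               # Q_s(theta_0, g_0)
f0 = (w[:, None] * Qx0 / Q0[:, None]).sum(axis=0) * g0      # (2.5)
true_p = [f_cond(Y, X, theta0) * g0 * mk / Q0[s] for s, mk in enumerate(masks)]


def g_tq(theta, q):
    return f0 / (w[:, None] * Q_given_x(theta) / q[:, None]).sum(axis=0)   # (3.11)


def expected_loglik(theta, q_free):
    """sum_s w_s E_{s,0} log p*_s(y, x; theta, q) with q = (q_free, 1)."""
    q = np.append(q_free, 1.0)
    log_joint = np.log(f_cond(Y, X, theta) * g_tq(theta, q))
    return sum(w[s] * np.sum(true_p[s] * (log_joint - np.log(q[s])))
               for s in range(len(masks)))


def grad_fd(theta, q_free, h=1e-6):
    """Central finite differences in q_1, ..., q_{S-1}."""
    grad = np.zeros_like(q_free)
    for j in range(len(q_free)):
        e = np.zeros_like(q_free)
        e[j] = h
        grad[j] = (expected_loglik(theta, q_free + e)
                   - expected_loglik(theta, q_free - e)) / (2 * h)
    return grad


def grad_formula(theta, q_free):
    """(w_j / q_j^2) {int Q_{j|x} g_{theta,q} dx - q_j}, j = 1, ..., S-1."""
    q = np.append(q_free, 1.0)
    return w[:-1] / q_free**2 * (Q_given_x(theta)[:-1] @ g_tq(theta, q) - q_free)


q_star = Q0[:-1] / Q0[-1]          # Q(theta_0)/Q_S(theta_0), since g_hat = g0 at theta_0
print(grad_fd(theta0, q_star))     # approximately zero, as (3.12) requires
q_other = np.array([0.8, 1.7])
print(grad_fd((0.2, 0.5), q_other), grad_formula((0.2, 0.5), q_other))
```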

By Theorem 3.1, the efficient score function and the efficient information matrix in this example are computed by (3.4) and (3.5), respectively. The score functions are

$$\ell_1(s,y,x;\theta,q)=\frac{(\partial/\partial\theta)f(y\mid x;\theta)}{f(y\mid x;\theta)}-\frac{\sum_{s'=1}^S w_{s'}(\partial/\partial\theta)Q_{s'|x}(x;\theta)/q_{s'}}{\sum_{s'=1}^S w_{s'}Q_{s'|x}(x;\theta)/q_{s'}}$$

and $\ell_2(s,y,x;\theta,q)=\{\ell_{21}(s,y,x;\theta,q),\dots,\ell_{2(S-1)}(s,y,x;\theta,q)\}$, where

$$\ell_{2j}(s,y,x;\theta,q)=\frac{1}{q_j}\left\{\frac{w_jQ_{j|x}(x;\theta)/q_j}{\sum_{s'=1}^S w_{s'}Q_{s'|x}(x;\theta)/q_{s'}}-1_{\{s=j\}}\right\}\quad(j=1,\dots,S-1).$$

The verification that $\sum_{s=1}^S w_sE_{s,0}(\ell_2^c\ell_2^{cT})$ is non-singular is omitted here.



4. Numerical example: Stratified sampling with logistic regression

Here we compare the maximum likelihood estimator (MLE) with estimators based on reparametrizations of the least favorable submodel. We show that the estimators based on reparametrizations are statistically as efficient as the MLE and computationally more efficient.
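Table 1. Leprosy data

| Age | Scar = 0: Case | Scar = 0: Control | Scar = 1: Case | Scar = 1: Control | Total: Case | Total: Control |
|---|---|---|---|---|---|---|
| 2.5 | 1 | 21 | 1 | 31 | 2 | 55 |
| 7.5 | 11 | 22 | 14 | 39 | 25 | 61 |
| 12.5 | 28 | 23 | 22 | 27 | 50 | 50 |
| 17.5 | 16 | 5 | 28 | 22 | 44 | 27 |
| 22.5 | 20 | 9 | 19 | 12 | 39 | 21 |
| 27.5 | 36 | 17 | 11 | 5 | 47 | 22 |
| 32.5 | 47 | 21 | 6 | 3 | 53 | 24 |
| Total | | | | | 260 | 260 |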



The data in Table 1 were taken from Scott and Wild [12, 13]. They form the case-control sampling part of a study of people under 35 in Northern Malawi. Cases are people with new cases of leprosy, and controls are people without leprosy. The variable "Scar" indicates the presence or absence of a BCG vaccination scar (1 = present, 0 = absent).

Let $x=(x_1,x_2)$ with $x_1=\text{Scar}$ and $x_2=100(\text{Age}+7.5)^{-2}$. We consider stratified sampling (case-control sampling) with the logistic regression model

$$f(y\mid x;\alpha,\beta)=\frac{\exp\{y(\alpha+x^T\beta)\}}{1+\exp(\alpha+x^T\beta)}\quad(y\in\{0,1\})\tag{4.1}$$

and the partition $\mathcal{Y}\times\mathcal{X}=(\{0\}\times\mathcal{X})\cup(\{1\}\times\mathcal{X})$, where $\alpha\in\mathbb{R}$ and $\beta\in\mathbb{R}^2$. In this case, with $s=0,1$,

$$Q_s(\alpha,\beta,g)=\int f(y=s\mid x;\alpha,\beta)g(x)\,dx$$

and

$$Q_{s|x}(x;\alpha,\beta)=f(y=s\mid x;\alpha,\beta).$$

From (3.10) and (3.11), a reparametrized model for the multisample model is

$$p_s^*(x;\alpha,\beta,\rho_1)=\frac{(q_0/q_s)f(y=s\mid x;\alpha,\beta)}{\sum_{s'=0}^1 w_{s'}(q_0/q_{s'})Q_{s'|x}(x;\alpha,\beta)}f_0(x)=\frac{\exp\{s(\alpha+\log\rho_1+x^T\beta)\}}{w_0+w_1\exp(\alpha+\log\rho_1+x^T\beta)}f_0(x),$$

where $\rho_0=q_0/q_0=1$ and $\rho_1=q_0/q_1$. The parameters in this model are not identifiable: $\alpha$ and $\rho_1$ cannot be estimated separately. By the argument for the stratified sampling example in Section 3.1, the efficient information bound for $(\alpha,\beta)$ is given by (3.5) in Theorem 3.1 with $\ell_1(s,x;\alpha,\beta,\rho_1)=\{\partial/\partial(\alpha,\beta)\}\log p_s^*(x;\alpha,\beta,\rho_1)$ and $\ell_2(s,x;\alpha,\beta,\rho_1)=(\partial/\partial\rho_1)\log p_s^*(x;\alpha,\beta,\rho_1)$. The estimator $(\hat\alpha,\hat\beta,\hat\rho_1)$ based on this non-identifiable reparametrization is the maximizer of the log-likelihood $\ell_n(\alpha,\beta,\rho_1)=\sum_{s=0}^1\sum_{i=1}^{n_s}\log p_s^*(x_{si};\alpha,\beta,\rho_1)$.

To gain identifiability of the parameters, we let $\alpha^*=\alpha+\log\rho_1$, and the model is further reparametrized as

$$p_s^*(x;\alpha^*,\beta)=\frac{\exp\{s(\alpha^*+x^T\beta)\}}{w_0+w_1\exp(\alpha^*+x^T\beta)}f_0(x).$$

Treat the parameters $\alpha$ and $g$ in the original model as nuisance parameters. Then Theorem 3.1 gives the efficient information bound for an estimator of $\beta$: it is (3.5) in Theorem 3.1 with $\ell_1(s,x;\alpha^*,\beta)=(\partial/\partial\beta)\log p_s^*(x;\alpha^*,\beta)$ and $\ell_2(s,x;\alpha^*,\beta)=(\partial/\partial\alpha^*)\log p_s^*(x;\alpha^*,\beta)$. The proof is similar to the one for the stratified sampling example given above and is therefore omitted. The estimator $(\hat\alpha^*,\hat\beta)$ based on this identifiable reparametrization is the maximizer of the log-likelihood for the data, $\ell_n(\alpha^*,\beta)=\sum_{s=0}^1\sum_{i=1}^{n_s}\log p_s^*(x_{si};\alpha^*,\beta)$.

If $X$ takes values in $\{v_1,\dots,v_K\}$, let $g_k=g(v_k)$, $k=1,\dots,K$. Then the log-likelihood for a single observation in the model can be written as

$$\log p_s(x;\alpha,\beta,g)=\log f(y=s\mid x;\alpha,\beta)+\sum_{k=1}^K1_{\{x=v_k\}}\log g_k-\log\sum_{k=1}^K f(y=s\mid v_k;\alpha,\beta)g_k.$$

The MLE $(\hat\alpha,\hat\beta,\hat g)$, where $\hat g=(\hat g_1,\dots,\hat g_K)$, is the maximizer of the log-likelihood $\ell_n(\alpha,\beta,g)=\sum_{s=0}^1\sum_{i=1}^{n_s}\log p_s(x_{si};\alpha,\beta,g)$.

For each case (non-identifiable reparametrization, identifiable reparametrization and maximum likelihood), let $\theta_1$ be the parameter of interest and $\theta_2$ the nuisance parameter. An estimated variance of the estimator of the parameter of interest is then $n^{-1}(\hat I^*)^{-1}$. Here $\hat I^*$ is computed by formula (3.5), with each $\sum_s w_sE_{s,0}(\ell_i^c\ell_j^{cT})$ ($i,j=1,2$) replaced by the corresponding second-order partial derivative $-n^{-1}(\partial^2/\partial\theta_i\partial\theta_j^T)\ell_n$.
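A sketch of fitting the identifiable reparametrization and computing the standard error of $\hat\beta$ as just described. It rests on three assumptions:

- Newton-Raphson is our choice of optimizer; the paper does not specify one.
- The data are synthetic, not the leprosy data.
- $w_s=n_s/n$, as in (1.1).

The term $\log f_0(x)$ is dropped because it does not involve $(\alpha^*,\beta)$.

```python
import numpy as np


def score_hessian(par, Z, s, w0, w1):
    """Gradient and Hessian of l_n(alpha*, beta) = sum_i [s_i eta_i - log(w0 + w1 e^{eta_i})]."""
    eta = Z @ par
    pi = w1 * np.exp(eta) / (w0 + w1 * np.exp(eta))   # d/d eta of log(w0 + w1 e^eta)
    grad = Z.T @ (s - pi)
    hess = -(Z * (pi * (1.0 - pi))[:, None]).T @ Z
    return grad, hess


def fit_identifiable(x, s, w0, w1, max_iter=100, tol=1e-10):
    """Newton-Raphson for (alpha*, beta); returns the estimate, gradient and Hessian."""
    Z = np.column_stack([np.ones(len(s)), x])         # columns: alpha*, beta_1, beta_2
    par = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        grad, hess = score_hessian(par, Z, s, w0, w1)
        step = np.linalg.solve(hess, grad)
        par = par - step
        if np.max(np.abs(step)) < tol:
            break
    grad, hess = score_hessian(par, Z, s, w0, w1)
    return par, grad, hess


# Synthetic case-control data (not the leprosy data): s = 1 for cases, 0 for controls.
rng = np.random.default_rng(2024)
n0, n1 = 200, 200
s = np.r_[np.zeros(n0), np.ones(n1)]
x = rng.normal(size=(n0 + n1, 2)) + 0.5 * s[:, None]
n = n0 + n1
par, grad, H = fit_identifiable(x, s, w0=n0 / n, w1=n1 / n)

# Estimated I* via (3.5), each block replaced by -n^{-1} times the Hessian block;
# theta_1 = beta (interest), theta_2 = alpha* (nuisance).
J = -H / n
J11, J12, J22 = J[1:, 1:], J[1:, :1], J[:1, :1]
I_star = J11 - J12 @ np.linalg.solve(J22, J12.T)
se_beta = np.sqrt(np.diag(np.linalg.inv(I_star)) / n)
print(par, se_beta)
```

$\pi_i$ is the logistic function evaluated at $\eta_i+\log(w_1/w_0)$. The fit is therefore a logistic regression with offset $\log(w_1/w_0)$, and with $n_0=n_1$, as here, the offset vanishes. The Schur-complement standard error agrees with the $\beta$-block of $(-H)^{-1}$, by the same partitioned-inverse identity behind Remark 3.3.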

Estimates of the regression coefficients and their standard errors (SE) in these models are given in Table 2. Under maximum likelihood and under the non-identifiable reparametrization, the intercept parameter is not identifiable. Its estimates and the corresponding SE are unreliable and unstable. Therefore, we do not look at estimates of


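Table 2. Model fitting results for the leprosy data

| | Maximum likelihood: Coef | Maximum likelihood: SE | Reparametrization (not identifiable): Coef | Reparametrization (not identifiable): SE | Reparametrization (identifiable): Coef | Reparametrization (identifiable): SE |
|---|---|---|---|---|---|---|
| Intercept | 1.55720 | 94.52766 | 0.61334 | 8388784 | | |
| Age | -0.30205 | 0.19737 | -0.30211 | 0.19737 | -0.30215 | 0.19736 |
| Scar | -4.30992 | 0.57891 | -4.31017 | 0.57892 | -4.30988 | 0.57889 |
| Computation time | 43.61 sec | | 2.80 sec | | 2.44 sec | |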
Table 3. Relative efficiency with respect to the maximum likelihood

| | Reparametrization (not identifiable) | Reparametrization (identifiable) |
|---|---|---|
| Age | 0.99997 | 0.99992 |
| Scar | 1.00005 | 0.99994 |
| Computation time | 0.06421 | 0.05595 |


the intercept parameter in these models. The estimated coefficients of "Age" and "Scar" and their SE are very similar across the three methods. This is consistent with the prediction of Theorem 3.1 that the reparametrizations attain the semi-parametric efficiency bound, which the MLE also achieves.

Table 3 gives the relative efficiency of the estimates from the non-identifiable and identifiable reparametrizations with respect to maximum likelihood. It also gives the relative efficiency in computation time, defined as the ratio of the corresponding computation times (for example, 2.80/43.61 ≈ 0.06421). The table indicates that these reparametrizations are statistically as efficient as, and computationally more efficient than, the method of maximum likelihood.



5. Discussion 

Theorem 3.1 gives conditions under which the efficient score function and the efficient information matrix can be expressed in terms of the parameters in the reparametrized model, namely by (3.4) and (3.5), respectively. In Section 4, we used Theorem 3.1 to show the efficiency of estimators based on non-identifiable and identifiable reparametrizations in the logistic regression model. We also showed that these estimators are computationally more efficient than the MLE. The results of the paper can be used to find a reparametrization of the least favorable submodel (or profile likelihood) that gives statistically and computationally efficient estimators in multisample models.



Appendix A 

We define the Hilbert space, the projection and the efficient score function.



A.1. Hilbert space and the projection

Let $\mathcal{H}$ be the Hilbert space of $m$-dimensional measurable functions with zero mean and finite variance:

$$\mathcal{H}=\left\{\psi(s,x):\ E_{s,0}(\psi)=0\ (s=1,\dots,S),\ \sum_{s=1}^S w_sE_{s,0}(\psi^T\psi)<\infty\right\}.$$

The covariance of $\psi,\phi\in\mathcal{H}$ is defined by $\mathrm{cov}(\psi,\phi)=\sum_{s=1}^S w_sE_{s,0}(\psi\phi^T)$. We say $\psi$ and $\phi$ are uncorrelated if $\mathrm{cov}(\psi,\phi)=0$. For a set of functions $\mathcal{G}$ in $\mathcal{H}$, $\mathcal{G}^\perp$ is the set of all functions $\psi\in\mathcal{H}$ with $\mathrm{cov}(\psi,\phi)=0$ for all $\phi\in\mathcal{G}$. The projection $\Pi(\psi\mid\mathcal{G})$ of $\psi\in\mathcal{H}$ onto a closed subspace $\mathcal{G}$ is characterized by

$$\Pi(\psi\mid\mathcal{G})\in\mathcal{G}\quad\text{and}\quad\psi-\Pi(\psi\mid\mathcal{G})\in\mathcal{G}^\perp.$$

For an arbitrary Banach space $\mathcal{B}$, let $\mathcal{B}^*$ be its dual. Let $A:\mathcal{B}\to\mathcal{H}$ be a bounded linear operator and $\psi\in\mathcal{H}$. The adjoint operator $A^T:\mathcal{H}\to\mathcal{B}^*$ of $A:\mathcal{B}\to\mathcal{H}$ is defined by the map

$$(A^T\psi)(b)=\langle Ab,\psi\rangle=\sum_{s=1}^S w_sE_{s,0}\{(Ab)\psi^T\},\quad b\in\mathcal{B}.$$

Suppose that $(A^TA)^{-1}$ exists, and let $\psi\in\mathcal{H}$. By the projection theorem for an operator equation,

$$\Pi(\psi\mid\overline{A(\mathcal{B})})=A(A^TA)^{-1}A^T\psi$$

is the projection of $\psi$ onto the closure $\overline{A(\mathcal{B})}$ of the range of $A$.
A.2. The projection theorem

Theorem A.1 (The projection theorem). Suppose $\phi(s,x)$ is an $l$-dimensional vector of measurable functions such that

(1) for $s=1,\dots,S$, $E_{s,0}(\phi)=0$;

(2) $\sum_{s=1}^S w_sE_{s,0}(\phi^T\phi)<\infty$;

(3) $\left\{\sum_{s=1}^S w_sE_{s,0}(\phi\phi^T)\right\}^{-1}$ exists.

Let $\mathcal{G}=\{A\phi:\ A\in\mathbb{R}^{m\times l}\}$ be the closed subspace of $\mathcal{H}$ generated by $\phi$. Then, for each $\psi\in\mathcal{H}$, the projection of $\psi$ onto $\mathcal{G}$ is given by

$$\Pi(\psi\mid\mathcal{G})=\left\{\sum_{s=1}^S w_sE_{s,0}(\psi\phi^T)\right\}\left\{\sum_{s=1}^S w_sE_{s,0}(\phi\phi^T)\right\}^{-1}\phi.$$

Proof. The proof is similar to the one for the standard case. □
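An empirical analogue of Theorem A.1. It assumes that each expectation $E_{s,0}$ is replaced by an average within sample $s$, and that $w_s=n_s/n$. After centering within samples, the residual $\psi-\Pi(\psi\mid\mathcal{G})$ is uncorrelated with $\phi$ under the weighted covariance. The helper names and the simulated data are illustrative.

```python
import numpy as np


def center_within(f, groups):
    """Subtract each sample's own mean, an empirical version of E_{s,0}(f) = 0 for every s."""
    f = np.array(f, dtype=float)
    for s in np.unique(groups):
        f[groups == s] -= f[groups == s].mean(axis=0)
    return f


def weighted_cov(a, b, groups, w):
    """Empirical cov(a, b) = sum_s w_s * (average over sample s of a b^T)."""
    out = np.zeros((a.shape[1], b.shape[1]))
    for s, w_s in enumerate(w):
        idx = groups == s
        out += w_s * a[idx].T @ b[idx] / idx.sum()
    return out


def project(psi, phi, groups, w):
    """Pi(psi | G) = cov(psi, phi) cov(phi, phi)^{-1} phi, cf. Theorem A.1."""
    coef = weighted_cov(psi, phi, groups, w) @ np.linalg.inv(weighted_cov(phi, phi, groups, w))
    return phi @ coef.T


rng = np.random.default_rng(3)
groups = np.repeat([0, 1], [30, 50])           # two samples with n_1 = 30, n_2 = 50
w = np.array([30, 50]) / 80                    # w_s = n_s / n, cf. (1.1)
phi = center_within(rng.normal(size=(80, 2)), groups)
psi = center_within(phi @ np.array([[1.0], [-2.0]]) + rng.normal(size=(80, 1)), groups)
resid = psi - project(psi, phi, groups, w)
print(weighted_cov(resid, phi, groups, w))     # numerically zero: resid lies in G-perp
```

Since $\phi$ is centered within each sample, the projection and the residual are centered as well. Both therefore stay in the empirical version of $\mathcal{H}$.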



A.3. The efficient score function

Here, we give the definition of the efficient score function in a multisample model.

We assume that the log-likelihood function for a single observation, $\ell(s,x;\beta,\eta)$ (defined by (2.1)), is continuously differentiable with respect to $\beta$ for all $\beta\in\Theta_\beta$ and Hadamard differentiable with respect to $\eta$ for all $\eta\in\Theta_\eta$. The score function $\dot\ell_\beta(s,x;\beta,\eta)$ for $\beta$ and the score operator $A(s,x;\beta,\eta)$ for $\eta$ in the multisample model are the derivatives of the log-likelihood function with respect to $\beta$ and $\eta$, respectively.

The tangent space for $\eta$ is the closure $\overline{A(\mathcal{B})}$ of the range of the score operator $A$ for $\eta$.

The uncorrelated complement of the score function $\dot\ell_\beta$ with respect to the tangent space for $\eta$,

$$\ell^*=\dot\ell_\beta-\Pi(\dot\ell_\beta\mid\overline{A(\mathcal{B})}),$$

is called the efficient score function in the multisample model $(\mathcal{P}_1,\dots,\mathcal{P}_S)$.



A.4. Theorem to identify the efficient score function

The following theorem may be useful for verifying that the function given by (2.3) is the efficient score function.

Theorem A.2. Let $t\mapsto\eta_t$ be a continuously differentiable map in a neighborhood of $0$ such that $\eta_{t=0}=\eta_0$, and define $\alpha_t=\eta_t-\eta_0$. Suppose $\beta\mapsto\hat\eta_\beta$ is a differentiable function such that

$$\hat\eta_{\beta_0}=\eta_0\tag{A.1}$$

and, for each $\beta\in\Theta_\beta$ and each path $\eta_t$,

$$\frac{\partial}{\partial t}\bigg|_{t=0}\sum_{s=1}^S w_sE_{s,0}\{\log p_s(X;\beta,\hat\eta_\beta+\alpha_t)\}=0.\tag{A.2}$$

Then the function

$$\ell^*(s,x)=\frac{\partial}{\partial\beta}\bigg|_{\beta=\beta_0}\log p_s(x;\beta,\hat\eta_\beta)\tag{A.3}$$

is the efficient score function.
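Proof. Condition (A.2) implies that

$$\frac{\partial}{\partial t}\bigg|_{t=0}\frac{\partial}{\partial\beta}\bigg|_{\beta=\beta_0}\sum_{s=1}^S w_sE_{s,0}\{\log p_s(X;\beta,\hat\eta_\beta+\alpha_t)\}=0.\tag{A.4}$$

Each $p_s(\cdot\,;\beta,\hat\eta_\beta+\alpha_t)$ integrates to one, so the identity

$$\sum_{s=1}^S w_s\int\left\{\frac{\partial}{\partial\beta}\log p_s(x;\beta,\hat\eta_\beta+\alpha_t)\right\}p_s(x;\beta,\hat\eta_\beta+\alpha_t)\,dx=0$$

holds. Differentiating it with respect to $t$ at $t=0$ and $\beta=\beta_0$, we get

$$\begin{aligned}0&=\frac{\partial}{\partial t}\bigg|_{t=0}\frac{\partial}{\partial\beta}\bigg|_{\beta=\beta_0}\sum_{s=1}^S w_sE_{s,0}\{\log p_s(X;\beta,\hat\eta_\beta+\alpha_t)\}+\sum_{s=1}^S w_sE_{s,0}\left\{\ell^*(s,X)\frac{\partial}{\partial t}\bigg|_{t=0}\log p_s(X;\beta_0,\eta_t)\right\}\\&\qquad(\text{we used (A.3) and }\hat\eta_{\beta_0}+\alpha_t=\eta_t\text{ by (A.1)})\\&=\sum_{s=1}^S w_sE_{s,0}\left\{\ell^*(s,X)\frac{\partial}{\partial t}\bigg|_{t=0}\log p_s(X;\beta_0,\eta_t)\right\}\quad(\text{by (A.4)}).\end{aligned}\tag{A.5}$$

Let $c\in\mathbb{R}^m$ be arbitrary. It follows from (A.5) that $c^T\ell^*(s,x)$ is orthogonal to the nuisance tangent space $\dot{\mathcal{P}}_\eta$, which is the closed linear span of score functions of the form $\phi(s,x)=(\partial/\partial t)|_{t=0}\log p_s(x;\beta_0,\eta_t)$. By (A.3) with (A.1), we have

$$\ell^*(s,x)=\frac{\partial}{\partial\beta}\bigg|_{\beta=\beta_0}\log p_s(x;\beta,\eta_0)+\frac{\partial}{\partial\beta}\bigg|_{\beta=\beta_0}\log p_s(x;\beta_0,\hat\eta_\beta)=\dot\ell_\beta(s,x)-\psi(s,x),$$

where $\dot\ell_\beta(s,x)=(\partial/\partial\beta)|_{\beta=\beta_0}\log p_s(x;\beta,\eta_0)$ and $\psi(s,x)=-(\partial/\partial\beta)|_{\beta=\beta_0}\log p_s(x;\beta_0,\hat\eta_\beta)$. Finally, $c^T\ell^*(s,x)=c^T\dot\ell_\beta(s,x)-c^T\psi(s,x)$ is orthogonal to the nuisance tangent space, and $c^T\psi(s,x)\in\dot{\mathcal{P}}_\eta$. Together these imply that $c^T\psi(s,x)$ is the orthogonal projection of $c^T\dot\ell_\beta(s,x)$ onto $\dot{\mathcal{P}}_\eta$. Since $c\in\mathbb{R}^m$ is arbitrary, the function $\ell^*(s,x)$ given by (A.3) is the efficient score function. □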



Appendix B 

B.1. Proof of Lemma 2.1

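Proof. We show that $\sum_{s=1}^S w_s\int\log p_s(y,x;\theta,\hat g_\theta)\,dF_{s0}$ satisfies conditions (A.1) and (A.2) in Theorem A.2 in Appendix A, so the claim follows from this theorem.

Condition (A.1) is verified in Remark 2.2. We now verify (A.2). Let $g_t(x)$ be a path in the space of density functions with $g_{t=0}(x)=g_0(x)$. Define $\alpha_t(x)=g_t(x)-g_0(x)$ and write $\alpha'(x)=(\partial/\partial t)|_{t=0}\alpha_t(x)$. Then

$$\begin{aligned}&\frac{\partial}{\partial t}\bigg|_{t=0}\sum_{s=1}^S w_s\int\log p_s(y,x;\theta,\hat g_\theta+\alpha_t)\,dF_{s0}\\&=\frac{\partial}{\partial t}\bigg|_{t=0}\sum_{s=1}^S w_s\left[\int\log\{\hat g_\theta(x)+\alpha_t(x)\}\,dF_{s0}-\log Q_s(\theta,\hat g_\theta+\alpha_t)\right]\\&=\frac{\partial}{\partial t}\bigg|_{t=0}\left[\int\log\{\hat g_\theta(x)+\alpha_t(x)\}f_0(x)\,dx-\sum_{s=1}^S w_s\log Q_s(\theta,\hat g_\theta+\alpha_t)\right]\\&=\int\alpha'(x)\left\{\frac{f_0(x)}{\hat g_\theta(x)}-\sum_{s=1}^S\frac{w_sQ_{s|x}(x;\theta)}{Q_s(\theta)}\right\}dx=0\end{aligned}$$

by (2.4), (2.5) and (2.6). □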